
    Unobtrusive and pervasive video-based eye-gaze tracking

    Eye-gaze tracking has long been considered a desktop technology that finds its use inside the traditional office setting, where the operating conditions may be controlled. Nonetheless, recent advances in mobile technology and a growing interest in capturing natural human behaviour have prompted efforts to track eye movements under unconstrained, real-life conditions, a paradigm referred to as pervasive eye-gaze tracking. This critical review focuses on emerging passive and unobtrusive video-based eye-gaze tracking methods in recent literature, with the aim of identifying the different research avenues being followed in response to the challenges of pervasive eye-gaze tracking. Different eye-gaze tracking approaches are discussed in order to bring out their strengths and weaknesses, and to identify any limitations, within the context of pervasive eye-gaze tracking, that have yet to be considered by the computer vision community.

    What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?

    In neural image captioning systems, a recurrent neural network (RNN) is typically viewed as the primary 'generation' component. This view suggests that the image features should be 'injected' into the RNN, and it is in fact the dominant view in the literature. Alternatively, the RNN can be viewed as only encoding the previously generated words. This view suggests that the RNN should be used only to encode linguistic features, and that only its final representation should be 'merged' with the image features at a later stage. This paper compares these two architectures. We find that, in general, late merging outperforms injection, suggesting that RNNs are better viewed as encoders rather than generators. Comment: Appears in: Proceedings of the 10th International Conference on Natural Language Generation (INLG'17)
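    As an illustration of the two architectures, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a toy Elman-style RNN with small, hand-picked weights (no training and no output layer), and all function and variable names are hypothetical. In the inject variant the image vector is appended to every word input, so the RNN state mixes both modalities; in the merge variant the RNN encodes only the words and the image vector is concatenated with its final state afterwards.

```python
import math
from typing import List

Vector = List[float]


def rnn_encode(inputs: List[Vector], w_in: List[Vector], w_rec: List[Vector]) -> Vector:
    """Toy Elman RNN, h_t = tanh(W_in x_t + W_rec h_{t-1}); returns the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in inputs:
        # The comprehension reads the previous `hidden` before it is reassigned.
        hidden = [
            math.tanh(sum(w * xi for w, xi in zip(w_in[j], x))
                      + sum(w * hk for w, hk in zip(w_rec[j], hidden)))
            for j in range(len(hidden))
        ]
    return hidden


def inject_features(words: List[Vector], image: Vector,
                    w_in: List[Vector], w_rec: List[Vector]) -> Vector:
    """Inject: the image is appended to every word input, so the RNN mixes both modalities."""
    return rnn_encode([word + image for word in words], w_in, w_rec)


def merge_features(words: List[Vector], image: Vector,
                   w_in: List[Vector], w_rec: List[Vector]) -> Vector:
    """Merge: the RNN encodes words only; the image is concatenated with its final state."""
    return rnn_encode(words, w_in, w_rec) + image


words = [[1.0, 0.0], [0.0, 1.0]]      # two toy word embeddings
image = [0.5, -0.5]                   # toy image feature vector
w_rec = [[0.1, 0.0], [0.0, 0.1]]
inject_w_in = [[0.2, 0.1, 0.3, 0.0], [0.0, 0.2, 0.0, 0.3]]  # word dims + image dims
merge_w_in = [[0.2, 0.1], [0.0, 0.2]]                        # word dims only
inject_state = inject_features(words, image, inject_w_in, w_rec)  # 2 values
merge_state = merge_features(words, image, merge_w_in, w_rec)     # 2 + 2 values
```

    In a full captioner either vector would feed a softmax layer over the vocabulary; the sketch only shows where the image enters. Note that in the merge variant the linguistic part of the vector does not change when the image changes, which is what "encoding only the previously generated words" means here.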

    Investigating user preferences in utilizing a 2D paper or 3D sketch based interface for creating 3D virtual models

    Computer modelling of 2D drawings is becoming increasingly popular in modern design, as can be seen in the shift of modern computer modelling applications from software requiring specialised training to software targeted at the general consumer market. Despite this, traditional sketching is still prevalent in design, particularly in the early design stages. Research trends in computer-aided modelling therefore focus on the development of sketch-based interfaces that are as natural as possible. In this report, we present a hybrid sketch-based interface which allows the user to draw sketches using offline as well as online sketching modalities and displays the 3D models in an immersive setup, thus linking the object interaction possible through immersive modelling to the flexibility allowed by paper-based sketching. The interface was evaluated in a user study whose results show that such a hybrid system can be considered to have pragmatic and hedonic value.

    On-screen point-of-regard estimation under natural head movement for a computer with integrated webcam

    Recent developments in the field of eye-gaze tracking by video-oculography indicate a growing interest in unobtrusive tracking in real-life scenarios, a new paradigm referred to as pervasive eye-gaze tracking. Among the challenges associated with this paradigm, the capability of a tracking platform to integrate well into devices with built-in imaging hardware, and to permit natural head movement during tracking, is important in less constrained scenarios. The work presented in this paper builds on our earlier work, which addressed the problem of estimating the on-screen point-of-regard from iris center movements captured by the integrated camera of a notebook computer, by proposing a method that approximates the head movements in conjunction with the iris movements in order to alleviate the requirement for a stationary head pose. Following iris localization by an appearance-based method, linear mapping functions for the iris and head movement are computed during a brief calibration procedure, permitting the image information to be mapped to a point-of-regard on the monitor screen. Once the point-of-regard has been calculated as a function of the iris and head movement, separate Kalman filters refine the noisy point-of-regard estimates to smooth the trajectory of the mouse cursor on the monitor screen. Quantitative and qualitative results obtained from two validation procedures reveal an improvement in estimation accuracy under natural head movement over the results of our earlier work.
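    The calibration and smoothing steps described above could look roughly like the following sketch. It assumes, purely for illustration, that each screen axis is mapped by a single linear function screen = a*iris + b*head + c fitted by least squares to the calibration points, and that each axis is smoothed by a one-dimensional random-walk Kalman filter. The authors' actual mapping functions, filter model and noise parameters are not given in the abstract, and all names and values here are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_axis_mapping(iris: Sequence[float], head: Sequence[float],
                     screen: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of screen = a*iris + b*head + c for one screen axis.

    Builds the normal equations (A^T A) p = A^T y and solves them with Cramer's rule.
    """
    rows = [(i, h, 1.0) for i, h in zip(iris, head)]
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    aty = [sum(r[p] * y for r, y in zip(rows, screen)) for p in range(3)]
    det = _det3(ata)
    if abs(det) < 1e-12:
        raise ValueError("calibration points do not determine a unique mapping")
    coeffs = []
    for k in range(3):
        m = [row[:] for row in ata]
        for p in range(3):
            m[p][k] = aty[p]            # replace column k with A^T y
        coeffs.append(_det3(m) / det)
    return coeffs[0], coeffs[1], coeffs[2]


class ScalarKalman:
    """1-D Kalman filter with a random-walk (constant-position) model, one per screen axis."""

    def __init__(self, process_var: float, measurement_var: float) -> None:
        self.q = process_var
        self.r = measurement_var
        self.x: Optional[float] = None   # filtered position (pixels)
        self.p = 0.0                     # variance of the estimate

    def update(self, z: float) -> float:
        if self.x is None:               # initialise from the first measurement
            self.x, self.p = z, self.r
            return self.x
        self.p += self.q                 # predict: position kept, uncertainty grows
        gain = self.p / (self.p + self.r)
        self.x += gain * (z - self.x)    # correct towards the new measurement
        self.p *= 1.0 - gain
        return self.x


# Synthetic calibration data generated from screen = 100*iris + 20*head + 640.
iris_x = [-1.0, 0.0, 1.0, 0.0, 1.0]
head_x = [0.0, 1.0, 0.0, -1.0, 1.0]
screen_x = [100.0 * i + 20.0 * h + 640.0 for i, h in zip(iris_x, head_x)]
a, b, c = fit_axis_mapping(iris_x, head_x, screen_x)

kalman_x = ScalarKalman(process_var=1.0, measurement_var=25.0)
smoothed = [kalman_x.update(z) for z in [500.0, 510.0, 490.0, 505.0, 495.0]]
```

    Because each filtered value is a convex combination of the previous estimate and the new measurement, the cursor moves less abruptly than the raw estimates; the ratio of `process_var` to `measurement_var` trades responsiveness against smoothness.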

    Where to put the image in an image caption generator

    When a neural language model is used for caption generation, the image information can be fed to the neural network either by directly incorporating it in a recurrent neural network (conditioning the language model by injecting image features) or in a layer following the recurrent neural network (conditioning the language model by merging the image features). While merging implies that visual features are bound at the end of the caption generation process, injecting can bind the visual features at a variety of stages. In this paper we empirically show that late binding is superior to early binding in terms of different evaluation metrics. This suggests that the different modalities (visual and linguistic) for caption generation should not be jointly encoded by the RNN; rather, the multimodal integration should be delayed to a subsequent stage. Furthermore, this suggests that recurrent neural networks should not be viewed as actually generating text, but only as encoding it for prediction in a subsequent layer.
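    To make the "variety of stages" concrete, the sketch below shows where the image vector enters the computation under three inject variants and under merge. The variant names init-inject, pre-inject and par-inject follow labels used in the captioning literature; the function name, the assumption that vectors already have matching sizes, and the toy values are hypothetical. Only the data flow is modelled, not an actual network.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def rnn_inputs(strategy: str, image: Vector, words: List[Vector],
               hidden_size: int) -> Tuple[Vector, List[Vector], Optional[Vector]]:
    """Return (initial RNN state, RNN input sequence, image passed to a later layer).

    Assumes the image vector already has the size each variant needs
    (hidden_size for init-inject, the word size for pre-inject); a real
    model would learn a projection for this.
    """
    zeros = [0.0] * hidden_size
    if strategy == "init-inject":    # image is the RNN's initial state
        return list(image), words, None
    if strategy == "pre-inject":     # image is the first item of the input sequence
        return zeros, [list(image)] + words, None
    if strategy == "par-inject":     # image accompanies every word
        return zeros, [word + image for word in words], None
    if strategy == "merge":          # RNN never sees the image; it joins after the RNN
        return zeros, words, list(image)
    raise ValueError(f"unknown strategy: {strategy}")


words = [[1.0, 0.0], [0.0, 1.0]]   # toy word embeddings
image = [0.5, -0.5]                # toy image vector (same size as words and state)
for name in ("init-inject", "pre-inject", "par-inject", "merge"):
    state, seq, late = rnn_inputs(name, image, words, hidden_size=2)
    print(f"{name:<12} initial={state} inputs={len(seq)} late_image={late}")
```

    Only in the merge case does the image bypass the RNN entirely, which corresponds to the late binding the abstract argues for.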

    Bimodal automated carotid ultrasound segmentation using geometrically constrained deep neural networks

    For asymptomatic patients suffering from carotid stenosis, the assessment of plaque morphology is an important clinical task that allows monitoring of the risk of plaque rupture and of future incidents of stroke. Ultrasound imaging provides a safe and non-invasive modality for this, and the segmentation of the media-adventitia and lumen-intima boundaries of the carotid artery forms an essential part of this monitoring process. In this paper, we propose a novel deep neural network as a fully automated segmentation tool and apply it to delineating both the media-adventitia boundary and the lumen-intima boundary. We develop a new geometrically constrained objective function as part of the network's stochastic gradient descent optimisation, thus tuning it to the problem at hand. Furthermore, we apply a bimodal fusion of amplitude and phase congruency data, proposed by us in previous work, as an input to the network, since the phase congruency data provides an intensity-invariant data source. Finally, we report the segmentation performance of the network on transverse sections of the carotid. Tests are carried out on an augmented dataset of 81,000 images, and the results are compared to other studies by reporting the Dice similarity coefficient, modified Hausdorff distance, sensitivity and specificity. Our proposed modification is shown to improve on the results of the standard network over this larger dataset, with the advantage of being fully automated. We conclude that deep neural networks provide a reliable, trainable means by which carotid ultrasound images may be automatically segmented, using amplitude data and intensity-invariant phase congruency maps as a data source.
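    The overlap and error metrics reported above can be computed as in the following minimal sketch. It assumes binary masks given as nested lists and boundaries given as lists of (x, y) points; the function names are hypothetical. The modified Hausdorff distance is taken in the common Dubuisson and Jain form (the maximum of the two mean directed distances), which may differ from the exact variant used in the paper.

```python
import math
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]   # binary mask: 1 = structure of interest, 0 = background
Point = Tuple[float, float]


def confusion(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Pixel-wise (TP, FP, FN, TN) counts between a predicted and a reference mask."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def dice(pred: Mask, truth: Mask) -> float:
    """Dice similarity coefficient, 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def sensitivity_specificity(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP); NaN when undefined."""
    tp, fp, fn, tn = confusion(pred, truth)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def modified_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Maximum of the two mean directed distances between boundary point sets."""
    def directed(src: Sequence[Point], dst: Sequence[Point]) -> float:
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return max(directed(a, b), directed(b, a))


pred = [[0, 1, 1], [0, 1, 0]]
truth = [[0, 1, 0], [0, 1, 1]]
print(dice(pred, truth), sensitivity_specificity(pred, truth))
```

    Averaging the point-to-set distances, rather than taking their maximum as the classical Hausdorff distance does, makes the measure less sensitive to a single outlying boundary point.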

    Using switching multiple models for the automatic detection of spindles

    Sleep EEG data is characterised by various events that allow for the identification of the different sleep stages. Stage 2 in particular is characterised by two morphologically distinct waveforms, namely spindles and K-complexes. Manual scoring of these events is time-consuming and prone to subjective interpretation; hence there is a need for robust automatic detection techniques. Various approaches have been adopted in the literature, ranging from period-amplitude analysis to spectral analysis and autoregressive modelling. Most of the adopted techniques follow an episodic approach, where the goal is to identify whether or not an epoch of EEG data contains an event such as a spindle. The disadvantage of this approach is that it requires the data to be segmented into epochs, risking that an event falls at an epoch boundary, and it has low temporal resolution. This work proposes the use of an autoregressive switching multiple model for the automatic segmentation and labelling of Stage 2 sleep EEG data characterised by spindles and K-complexes. When this modelling technique was used to distinguish spindles from background EEG, quantitative results computed on a sample-by-sample basis gave a sensitivity between 72.39% and 87.51%, depending on which scorer's annotations performance was compared against. This score corresponds to a specificity between 78.89% and 90.55%, which increases to a range between 75.52% and 94.64% when performance is measured on an event basis instead [1]. This performance compares well with other spindle detection techniques published in the literature [2,3]. The advantages of the proposed technique are that it allows for the continuous segmentation of EEG data, offers a unified framework to detect multiple events with little training data, and can be extended to a semi-supervised approach. The latter, which has also been applied to Stage 2 sleep EEG data, can identify new states in real time, providing a solution that not only replaces the time-consuming manual scoring process but may also provide the clinician with new insights into the data being analysed.
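    The following is a deliberately simplified, hard-decision stand-in for the switching multiple model idea, not the authors' method. Each candidate state is an autoregressive model with known coefficients (assumed to have been estimated beforehand), and each sample is labelled with the state whose one-step prediction error, averaged over a short trailing window, is smallest. A full switching model would instead track state probabilities over time; the window length, the coefficients, the function names and the 256 Hz / 12 Hz toy signal are all assumptions for illustration. Because labels are produced per sample rather than per epoch, no epoch boundaries are involved.

```python
import math
from typing import Dict, List, Sequence


def ar_residuals(signal: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """One-step AR prediction errors e[n] = x[n] - sum_k a_k x[n-k]; the first p samples are 0."""
    p = len(coeffs)
    residuals = [0.0] * len(signal)
    for n in range(p, len(signal)):
        prediction = sum(a * signal[n - k - 1] for k, a in enumerate(coeffs))
        residuals[n] = signal[n] - prediction
    return residuals


def label_samples(signal: Sequence[float], models: Dict[str, Sequence[float]],
                  window: int = 5) -> List[str]:
    """Label every sample with the AR model of lowest mean squared error over a trailing window."""
    residuals = {name: ar_residuals(signal, c) for name, c in models.items()}
    labels = []
    for n in range(len(signal)):
        start = max(0, n - window + 1)
        costs = {
            name: sum(e * e for e in res[start:n + 1]) / (n + 1 - start)
            for name, res in residuals.items()
        }
        labels.append(min(costs, key=costs.get))
    return labels


# Toy signal: AR(1) "background" followed by a pure 12 Hz sinusoid sampled at 256 Hz,
# which satisfies the AR(2) recursion x[n] = 2cos(w) x[n-1] - x[n-2] exactly.
w = 2 * math.pi * 12.0 / 256.0
signal = [0.9 ** n for n in range(20)] + [math.sin(w * m) for m in range(20)]
models = {"background": [0.9], "spindle": [2 * math.cos(w), -1.0]}
labels = label_samples(signal, models)
```

    The trailing window trades temporal resolution against robustness to noise; a probabilistic switching model avoids choosing such a window explicitly.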

    Parametric Modelling of EEG Data for the Identification of Mental Tasks

    Electroencephalographic (EEG) data is widely used as a biosignal for the identification of different mental states in the human brain. EEG signals can be captured by relatively inexpensive equipment, and acquisition procedures are non-invasive and not overly complicated. On the negative side, EEG signals have a low signal-to-noise ratio and are non-stationary, which makes extracting useful information from them a challenging task.

    An investigation on server-side object-scene recognition performance using coarse location information and camera phone-captured images

    This paper presents a solution aimed at the cultural tourist that is based on information already residing within a mobile network. It also demonstrates how scene (or landmark) recognition from an image can be achieved by combining local invariant image features, cell location information and classification based on Self-Organizing Map clustering. The proposed server-side approach makes the solution independent of the mobile platform and thus accessible to any camera-embedded mobile station with the Multimedia Messaging Service enabled.
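    The pipeline described above might be sketched as follows: a small one-dimensional Self-Organizing Map is trained on local feature descriptors, each map node is labelled with the landmark whose training descriptors it wins most often, and at query time the search is restricted to nodes whose landmark lies in the phone's current cell. This is an illustrative sketch under stated assumptions, not the authors' system: the 2-D toy descriptors stand in for real local invariant features, and the node initialisation, learning-rate and neighbourhood schedules, and all function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set

Vec = Sequence[float]


def bmu(nodes: List[List[float]], x: Vec, candidates: Optional[Sequence[int]] = None) -> int:
    """Index of the best-matching unit (nearest node), optionally among a candidate subset."""
    indices = range(len(nodes)) if candidates is None else candidates
    return min(indices, key=lambda i: math.dist(nodes[i], x))


def train_som(data: List[Vec], n_nodes: int = 6, epochs: int = 30,
              lr0: float = 0.5, radius0: float = 2.0, seed: int = 0) -> List[List[float]]:
    """Train a 1-D SOM whose nodes start evenly spaced between the data's per-dimension min and max."""
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    nodes = [[lo[d] + (hi[d] - lo[d]) * i / (n_nodes - 1) for d in range(dim)]
             for i in range(n_nodes)]
    rng = random.Random(seed)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):            # one shuffled pass
            frac = step / total
            lr = lr0 * (1.0 - frac)                       # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)     # shrinking neighbourhood
            b = bmu(nodes, x)
            for i, w in enumerate(nodes):
                h = math.exp(-((i - b) ** 2) / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return nodes


def label_nodes(nodes: List[List[float]], descriptors: List[Vec],
                landmarks: List[str]) -> Dict[int, str]:
    """Label each node with the landmark whose training descriptors it wins most often."""
    votes: Dict[int, Counter] = {}
    for x, name in zip(descriptors, landmarks):
        votes.setdefault(bmu(nodes, x), Counter())[name] += 1
    return {i: c.most_common(1)[0][0] for i, c in votes.items()}


def classify(nodes: List[List[float]], node_labels: Dict[int, str],
             query: List[Vec], cell_landmarks: Set[str]) -> Optional[str]:
    """Majority vote over query descriptors, searching only nodes whose landmark is in the cell."""
    allowed = [i for i, name in node_labels.items() if name in cell_landmarks]
    if not allowed or not query:
        return None
    votes = Counter(node_labels[bmu(nodes, q, allowed)] for q in query)
    return votes.most_common(1)[0][0]


# Toy data: two landmarks with well-separated 2-D "descriptors".
rng = random.Random(1)
tower = [[0.1 + rng.uniform(-0.05, 0.05), 0.1 + rng.uniform(-0.05, 0.05)] for _ in range(10)]
church = [[0.9 + rng.uniform(-0.05, 0.05), 0.9 + rng.uniform(-0.05, 0.05)] for _ in range(10)]
descriptors = tower + church
names = ["tower"] * 10 + ["church"] * 10
nodes = train_som(descriptors)
node_labels = label_nodes(nodes, descriptors, names)
query = [[0.12, 0.08], [0.09, 0.11]]
print(classify(nodes, node_labels, query, {"tower", "church"}))
```

    Restricting the search by cell is what lets coarse location information prune the candidate landmarks: with only `{"church"}` in the cell, the same query can only be matched to church nodes.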